Antipattern: One Size Fits All
Let's see some caveats of using database keys traditionally.
Books, articles, and programming frameworks have established a cultural convention that every database table must have a primary key column with the following characteristics:
-
The primary key’s column name is
id
. -
Its data type is a 32-bit or 64-bit integer.
-
Unique values are generated automatically.
The presence of a column named id
in every table is so common that this has become synonymous with a primary key. Programmers learning SQL get the false idea that a primary key always means a column defined in this manner.
Adding an id
column to every table causes several effects that make its use seem arbitrary.
Making a redundant key#
We may sometimes see an id
column defined as the primary key simply for the sake of tradition, even when another column in the same table could be used as the natural primary key. The other column may even be defined with a UNIQUE
constraint. For example, in the Bugs
table, we could label bugs using a string with a mnemonic for the project the bug belongs to or other identifying information.
Let’s run the query in the following playground to see the effect on the database.
The bug_id
column in the example has similar usage to id
in that it serves to identify each row uniquely.
Allowing duplicate rows#
A compound key consists of multiple columns. One typical use for a compound key is in an intersection table like BugsProducts
. The primary key should ensure that a given combination of values for bug_id
and product_id
appears only once in the table, even though each value may appear many times in different pairings.
However, when we use the mandatory id
column as the primary key, the constraint no longer applies to two columns that should be unique.
Let’s try to run the code in the playground below.
We’ll find that there are 3 records for the same
bug_id
and theproduct_id
. This means that these records are duplicated.
Duplicates in this intersection table cause unintended results (because of duplicated records) when we use the table to match Bugs
with Products
. To prevent duplicates, we can declare a UNIQUE
constraint over the two columns besides id
:
But if we need a unique constraint over those two columns anyway, the id column is superfluous.
Obscuring the meaning of the key#
The word “code” has a number of definitions, one of which is a way to communicate a message with brevity or secrecy. In programming, we should have the opposite goal — to make the meaning clearer.
The name id
is so generic that it holds no meaning. This is especially important when we join two tables, and they have the same primary key column name.
How do you tell the bug id from the account id in your application code if you reference columns by name instead of by ordinal position? This is a problem, especially in dynamic languages like PHP, when a query result is an associative array: one column overwrites the other unless you specify column aliases in your query.
The name of the id
column doesn’t help make the query any clearer. If, instead, the columns were named bug_id
and account_id
, the reader would have a much easier time reading the query results. We use a primary key to address individual rows of a table, so the column’s name should give a clue about the type of entity in that table.
Using USING
#
A lot of us are probably already familiar with the SQL syntax for a join — using the keywords JOIN
and ON
before an expression to evaluate matching rows in the two tables.
Let’s try to run this query in the next playground.
SQL also supports a more concise syntax for expressing a join between two tables. We can rewrite the previous query in the following way if the columns have the same name in both tables:
However, if all tables are required to define a pseudo key as a primary key named id
, then a foreign key column in a dependent table can never use the same name as the primary key it references. Instead, we must always use the more verbose ON
syntax:
Compound keys are hard#
Some developers refuse to use compound keys because they feel that these keys are too hard to use. Any expression that compares a key to another must compare all columns. A foreign key that references a compound primary key must itself be a compound foreign key. It requires more typing to use compound keys.
This refusal is like a mathematician refusing to use two-dimensional or three-dimensional coordinates, and insisting on performing all calculations as though objects exist within a one-dimensional, linear space. It’s true that this would make a lot of geometry and trigonometry much simpler, but it fails to describe real-world objects that we need to work with.